Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs

Authors

  • Aurélien Max
  • Houda Bouamor
  • Anne Vilnat
Abstract

This paper describes a study on the impact of the original signal (text, speech, visual scene, event) of a text pair on the task of both manual and automatic sub-sentential paraphrase acquisition. A corpus of 2,500 annotated sentences in English and French is described, and performance on this corpus is reported for an efficient system combination exploiting a large set of features for paraphrase recognition. A detailed quantified typology of the sub-sentential paraphrases found in our corpus types is given.

Similar resources

Etude de la paraphrase sous-phrastique en traitement automatique des langues. (A study of sub-sentential paraphrases in Natural Language Processing)

Language variation, or the fact that messages can be conveyed in a great variety of ways by means of linguistic expressions, is one of the most challenging and certainly fascinating features of language for Natural Language Processing, with wide applications in language analysis and generation. The term paraphrase is now commonly used to refer to textual units of equivalent meaning, down to the...

A contrastive review of paraphrase acquisition techniques

This paper addresses the issue of what approach should be used for building a corpus of sentential paraphrases depending on one’s requirements. Six strategies are studied: (1) multiple translations into a single language from another language; (2) multiple translations into a single language from different other languages; (3) multiple descriptions of short videos; (4) multiple subtitles for th...

Validation of sub-sentential paraphrases acquired from parallel monolingual corpora

The task of paraphrase acquisition from related sentences can be tackled by a variety of techniques making use of various types of knowledge. In this work, we make the hypothesis that their performance can be increased if candidate paraphrases can be validated using information that characterizes paraphrases independently of the set of techniques that proposed them. We implement this as a bi-cl...

Une étude en 3D de la paraphrase: types de corpus, langues et techniques (A Study of Paraphrase along 3 Dimensions : Corpus Types, Languages and Techniques) [in French]

In this paper, we report a detailed study of the impact of corpus type on the task of sub-sentential paraphrase acquisition. Our experiments are for 2 languages and 4 corpus types, and involve an efficient machine learning-based combination of 4 paraphrase acquisition systems. We obtain relative improvements of mo...

Automatically Constructing a Corpus of Sentential Paraphrases

An obstacle to research in automatic paraphrase identification and generation is the lack of large-scale, publicly available labeled corpora of sentential paraphrases. This paper describes the creation of the recently-released Microsoft Research Paraphrase Corpus, which contains 5801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase. The cor...

Publication date: 2012